A startup sells food products through a mobile app. We need to figure out how the app's users behave.
Let's explore the sales funnel: find out how users get to a purchase, which steps they go through, what share of them become buyers, and what share get "stuck" at the earlier steps.
After that, let's analyze the results of an A/A/B experiment. The designers wanted to change the fonts throughout the app, and the managers feared that the new fonts would scare users away. Both sides agreed to make the decision based on the A/A/B test results. Users were split into 3 groups: 2 control groups with the old fonts and one experimental group with the new ones. Let's find out which font works better.
Having two A groups instead of one has certain advantages. If the two control groups do not differ, we can be more confident that the test is set up correctly. If there are significant differences between the two A groups, this helps to identify the factors that skewed the results. Comparing the control groups also helps to estimate how much time and data further tests will require.
Each log entry is a user action or event.
import pandas as pd
import matplotlib.pyplot as plt
from plotly import graph_objects as go
import plotly.express as px
import plotly.io as pio
pio.templates.default = "plotly_white"
import numpy as np
import seaborn as sns
sns.set(style="whitegrid")
colors =["#ef476f","#ffd166","#06d6a0","#118ab2","#073b4c"]
sns.set_palette(sns.color_palette(colors))
import re
import warnings
warnings.filterwarnings("ignore")
from scipy import stats as st
import math as mth
df = pd.read_csv('datasets/logs_exp.csv', sep='\t')
df.info()
df.head()
The log contains the application events with the event name, user ID, event time, and the group (experiment) number.
df.columns = ['event_name', 'user_id', 'datetime', 'group']
df['datetime'] = pd.to_datetime(df['datetime'], unit='s')
df['date'] = df['datetime'].dt.strftime('%Y-%m-%d')
df.duplicated().sum()
df = df.drop_duplicates().reset_index(drop=True)
df.info()
df.head()
We renamed the columns for convenience, converted the timestamp column from Unix seconds to datetime, added a column that contains only the date, and removed duplicate rows.
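The same preprocessing can be tried on a tiny hand-made frame before running it on the real log. This is a minimal sketch: the raw column names (`raw_event`, `raw_user`, `raw_ts`, `raw_group`) and the rows are hypothetical, chosen only for illustration, and the timestamps are Unix seconds, as implied by `unit='s'` above. The function name `prepare_log` is also hypothetical.

```python
import pandas as pd

# Hypothetical raw log: column names and rows are made up for illustration.
raw = pd.DataFrame({
    'raw_event': ['MainScreenAppear', 'MainScreenAppear', 'OffersScreenAppear'],
    'raw_user': [1, 1, 1],
    'raw_ts': [1564617657, 1564617657, 1564617700],  # Unix seconds; rows 0 and 1 are duplicates
    'raw_group': [246, 246, 246],
})


def prepare_log(log: pd.DataFrame) -> pd.DataFrame:
    """Rename columns, parse Unix timestamps, add a date column, drop exact duplicates."""
    log = log.copy()
    log.columns = ['event_name', 'user_id', 'datetime', 'group']
    log['datetime'] = pd.to_datetime(log['datetime'], unit='s')
    log['date'] = log['datetime'].dt.strftime('%Y-%m-%d')
    return log.drop_duplicates().reset_index(drop=True)


clean = prepare_log(raw)
print(clean)
```

The duplicate is detected across all columns, so the derived `date` column does not affect which rows are dropped.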
print('There are logs for the period from {} to {}.'.format(df['datetime'].min(), df['datetime'].max()))
print('')
print('The total number of events in the log is {}.'.format(df['user_id'].count()))
print('')
print('There are {} users in the log.'.format(df['user_id'].nunique()))
print('')
print('The median number of events per user is {}.'
      .format(int(df.groupby('user_id')['event_name'].agg('count').median())))
plt.figure(figsize=(15, 50))
ax = sns.countplot(y=df['datetime'].dt.strftime('%Y-%m-%d %H'), hue='event_name', data=df, dodge=False)
ax.set_title('Number of different events over a period of time')
plt.show()
The log covers the period from 2019-07-25 04:43:36 to 2019-08-07 21:15:17. It contains 243713 events from 7551 users, with a median of 20 events per user.
Looking at the chart of events by date and hour, we see that the data is equally complete only from 21:00 on July 31, 2019 to the end of the log on August 7, 2019.
Let's discard the incomplete data and keep only the period starting at 21:00 on July 31, 2019:
df = df[df['datetime'] >= '2019-07-31 21:00'].reset_index(drop=True)
print('Further analysis will be carried out for data from {} to {}.'
.format(df['datetime'].min(), df['datetime'].max()))
print('')
print('The total number of events in the log for the selected period is {}.'.format(df['user_id'].count()))
print('')
print('The total number of users in the log for the selected period is {}.'.format(df['user_id'].nunique()))
print('')
print('The median number of events per user is {}.'
      .format(int(df.groupby('user_id')['event_name'].agg('count').median())))
plt.figure(figsize=(15, 35))
ax = sns.countplot(y=df['datetime'].dt.strftime('%Y-%m-%d %H'), data=df, hue='group')
ax.set_title('Number of events for each of the groups')
plt.show()
Further analysis covers the data from 21:00:57 on July 31, 2019 to 21:15:17 on August 7, 2019.
By discarding the incomplete data we lost no more than one percent of the events, and the chart shows that all three groups are still represented.
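The "no more than one percent" figure can be checked directly by comparing the log before and after the cut. A minimal sketch, assuming a copy of the unfiltered frame is kept before filtering (called `full` here, a hypothetical name), since the filtering cell overwrites `df`. The helper `share_lost` and the toy rows are made up.

```python
import pandas as pd


def share_lost(full: pd.DataFrame, cutoff: str) -> dict:
    """Share of events and of users dropped when keeping only rows with datetime >= cutoff."""
    kept = full[full['datetime'] >= pd.Timestamp(cutoff)]
    return {
        'events': 1 - len(kept) / len(full),
        'users': 1 - kept['user_id'].nunique() / full['user_id'].nunique(),
    }


# Toy log (made-up rows): user 3 only appears before the cutoff.
toy = pd.DataFrame({
    'user_id': [1, 1, 2, 3],
    'datetime': pd.to_datetime(['2019-08-01 10:00', '2019-08-02 11:00',
                                '2019-08-03 12:00', '2019-07-30 09:00']),
})
result = share_lost(toy, '2019-07-31 21:00')
print(result)
```

On the real data, this would be called with the unfiltered log and the cutoff `'2019-07-31 21:00'`.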
Let's see which events appear in the logs and how often they occur, sorted by frequency.
events = df.groupby('event_name')['user_id'].agg(['count', 'nunique']).reset_index()
events.columns =['event_name', 'n_events', 'n_users']
n_users = {'all': df['user_id'].nunique(),
246: df[df['group']==246]['user_id'].nunique(),
247: df[df['group']==247]['user_id'].nunique(),
248: df[df['group']==248]['user_id'].nunique(),
'246+247': df[(df['group']==246) | (df['group']==247)]['user_id'].nunique()}
plt.figure(figsize=(14, 7))
order = events.sort_values('n_events', ascending=False).reset_index(drop=True)['event_name']
ax = sns.barplot(y='event_name', x='n_events', order = order, data=events)
ax.set_title('Events by frequency')
# label each bar: inside in white for long bars, outside in grey for short ones
for i in ax.patches:
    if i.get_width() > 20000:
        ax.text(i.get_width() - 11000, i.get_y() + 0.5,
                str(int(i.get_width())), fontsize=20, color='white')
    else:
        ax.text(i.get_width() + 30, i.get_y() + 0.5,
                str(int(i.get_width())), fontsize=20, color='grey')
plt.show()
# number of events and number of unique users for each event in each group
event_pivot = (df.groupby(['event_name', 'group'])['user_id']
               .agg(['count', 'nunique'])
               .reset_index())
event_pivot.columns = ['event_name', 'group', 'n_events', 'n_users']
plt.figure(figsize=(14, 7))
ax = sns.barplot(y='event_name', x='n_events', order = order, hue='group', data=event_pivot)
ax.set_title('Events by frequency in the groups')
for i in ax.patches:
    if i.get_width() > 9000:
        ax.text(i.get_width() - 2500, i.get_y() + 0.2,
                str(int(i.get_width())), fontsize=15, color='white')
    else:
        ax.text(i.get_width() + 30, i.get_y() + 0.2,
                str(int(i.get_width())), fontsize=15, color='grey')
plt.show()
Let's count how many users performed each of these events, sort the events by the number of users, and calculate the share of users who performed each event at least once.
plt.figure(figsize=(14, 7))
ax = sns.barplot(y='event_name', x='n_users', order = order, data=events)
ax.set_title('Share of users who have passed the event at least once')
for i in ax.patches:
    if i.get_width() > 3000:
        ax.text(i.get_width() - 1300, i.get_y() + 0.5,
                str(int(i.get_width())) + ' ({:.1%})'.format(i.get_width() / n_users['all']),
                fontsize=20, color='white')
    else:
        ax.text(i.get_width() + 30, i.get_y() + 0.5,
                str(int(i.get_width())) + ' ({:.1%})'.format(i.get_width() / n_users['all']),
                fontsize=20, color='grey')
plt.show()
plt.figure(figsize=(14, 7))
ax = sns.barplot(y='event_name', x='n_users', order = order, hue='group', data=event_pivot)
ax.set_title('Share of users in groups that have passed the event at least once')
# seaborn draws the bars one hue level at a time, so each consecutive block of
# len(order) patches belongs to groups 246, 247 and 248 in turn
for i, v in enumerate(ax.patches):
    group = [246, 247, 248][min(i // len(order), 2)]
    width = v.get_width()
    label = str(int(width)) + ' ({:.1%})'.format(width / n_users[group])
    if width > 1000:
        ax.text(width - 310, v.get_y() + 0.2, label, fontsize=15, color='white')
    else:
        ax.text(width + 10, v.get_y() + 0.2, label, fontsize=15, color='grey')
plt.show()
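The per-event user counts and shares printed on the bars above can also be produced as a table. A minimal sketch on made-up data; the helper name `event_user_shares` is hypothetical, and the share uses all users in the log as the denominator, matching `n_users['all']`.

```python
import pandas as pd


def event_user_shares(log: pd.DataFrame) -> pd.DataFrame:
    """Number and share of unique users who triggered each event at least once."""
    total_users = log['user_id'].nunique()
    per_event = (log.groupby('event_name')['user_id']
                 .nunique()
                 .sort_values(ascending=False)
                 .rename('n_users')
                 .to_frame())
    per_event['share'] = per_event['n_users'] / total_users
    return per_event


# Made-up log: 4 users see the main screen, only 2 of them reach the offers screen.
toy = pd.DataFrame({
    'event_name': ['MainScreenAppear'] * 4 + ['OffersScreenAppear'] * 3,
    'user_id': [1, 2, 3, 4, 1, 1, 2],
})
shares = event_user_shares(toy)
print(shares)
```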
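Suppose the user goes through the steps MainScreenAppear, OffersScreenAppear, CartScreenAppear and PaymentScreenSuccessful in that order; the Tutorial event is optional, so it is left out of the funnel.
Using the event funnel, let's calculate what share of users proceed to each next step of the funnel (ignoring the order in which the events occurred):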
order = (events[events['event_name'] != 'Tutorial']
.sort_values('n_events', ascending=False)['event_name']
.reset_index(drop=True))
groups = [246, 247, 248]
# number of unique users in each group who triggered each event (order ignored)
simple_funnel = {}
for group in groups:
    simple_funnel[group] = []
    for event in order:
        simple_funnel[group].append(
            df[(df['group'] == group) & (df['event_name'] == event)]['user_id'].nunique())
fig = go.Figure()
for i, group in enumerate(groups):
    # unique users per event in this group, Tutorial excluded, largest step first
    group_events = (event_pivot[(event_pivot['group'] == group) & (event_pivot['event_name'] != 'Tutorial')]
                    .sort_values('n_users', ascending=False))
    fig.add_trace(go.Funnel(
        name=str(group),
        y=group_events['event_name'],
        x=group_events['n_users'],
        textposition="inside",
        textinfo="value+percent previous",
        marker={"color": colors[i]},
        connector={"fillcolor": '#bde0eb'},
        insidetextfont={'color': 'white', 'size': 14}))
fig.show()
Looking at this funnel, we can see that the largest drop-off happens right at the start: 37% of users in group 246 and 39% in groups 247 and 248 see the main screen (MainScreenAppear) but never even open the product catalog (OffersScreenAppear). This may indicate that the main page is inconvenient or displays incorrectly on some devices. It would be good to request data on device models and check whether the drop-off depends on the device.
Fewer than half of users get from the first event to a successful payment (49% / 46.7% / 47.4% for groups 246/247/248, respectively).
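The "percent previous" labels on the funnel and the overall first-to-payment conversion quoted above are simple ratios of step counts. A minimal sketch, assuming the counts are unique users per step listed in funnel order, as in `simple_funnel`; the helper name `funnel_conversions` is hypothetical and the numbers in the example are made up, not taken from the log.

```python
def funnel_conversions(counts):
    """Return (share of previous step, share of first step) for each funnel step.

    counts: unique-user counts per step, in funnel order.
    """
    prev_share = [1.0] + [cur / prev for prev, cur in zip(counts, counts[1:])]
    first_share = [c / counts[0] for c in counts]
    return prev_share, first_share


# Hypothetical counts for four steps, not taken from the real log.
prev_share, first_share = funnel_conversions([1000, 620, 500, 470])
print(['{:.1%}'.format(s) for s in prev_share])
print(['{:.1%}'.format(s) for s in first_share])
```

On the real data, `funnel_conversions(simple_funnel[246])` would give the step-to-step and overall conversion for group 246; the last element of the second list is the first-event-to-payment share.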
Now let's take a look at the funnel considering the intended sequence of events:
users = df[df['event_name'] != 'Tutorial'].pivot_table(
index=['user_id', 'group'],
columns='event_name',
values='datetime',
aggfunc='min').reset_index()
funnel = {}
for group in groups:
    funnel[group] = []
    # each step requires the previous one plus a later first occurrence of this event;
    # comparisons with NaT (event never happened) evaluate to False
    step_1 = (users['group'] == group) & (~users['MainScreenAppear'].isna())
    step_2 = step_1 & (users['OffersScreenAppear'] > users['MainScreenAppear'])
    step_3 = step_2 & (users['CartScreenAppear'] > users['OffersScreenAppear'])
    step_4 = step_3 & (users['PaymentScreenSuccessful'] > users['CartScreenAppear'])
    for step in (step_1, step_2, step_3, step_4):
        funnel[group].append(users[step].shape[0])
fig = go.Figure()
# step names in the same order as the counts collected in `funnel`
steps = ['MainScreenAppear', 'OffersScreenAppear', 'CartScreenAppear', 'PaymentScreenSuccessful']
for i, group in enumerate(groups):
    fig.add_trace(go.Funnel(
        name=str(group),
        y=steps,
        x=funnel[group],
        textposition="inside",
        textinfo="percent previous",
        constraintext='outside',
        textangle=90,
        marker={"color": colors[i]},
        connector={"fillcolor": '#bde0eb'},
        insidetextfont={'color': 'white'}))
fig.show()
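The step conditions above can be wrapped into a reusable function and checked on a few hand-made users. A minimal sketch, assuming the same rule as the loop (each step's first occurrence must come strictly after the previous step's first occurrence) and that all four step events occur in the log passed in; filter the log by group before calling it. The function name `ordered_funnel` and the toy users are hypothetical.

```python
import pandas as pd

STEPS = ['MainScreenAppear', 'OffersScreenAppear',
         'CartScreenAppear', 'PaymentScreenSuccessful']


def ordered_funnel(log: pd.DataFrame, steps=STEPS) -> list:
    """Count users who reach each step after the previous one, using first occurrences."""
    # first time each user triggered each event; NaT where the event never happened
    first_seen = log.groupby(['user_id', 'event_name'])['datetime'].min().unstack()
    reached = first_seen[steps[0]].notna()
    counts = [int(reached.sum())]
    for prev, cur in zip(steps, steps[1:]):
        # comparisons involving NaT are False, so users who skipped `cur` drop out
        reached &= first_seen[cur] > first_seen[prev]
        counts.append(int(reached.sum()))
    return counts


# Made-up users: 1 follows the full path, 2 pays without offers or cart,
# 3 opens the cart before the catalog.
toy = pd.DataFrame({
    'user_id': [1, 1, 1, 1, 2, 2, 3, 3, 3],
    'event_name': ['MainScreenAppear', 'OffersScreenAppear', 'CartScreenAppear',
                   'PaymentScreenSuccessful', 'MainScreenAppear', 'PaymentScreenSuccessful',
                   'MainScreenAppear', 'CartScreenAppear', 'OffersScreenAppear'],
    'datetime': pd.to_datetime(['2019-08-01 10:00', '2019-08-01 10:01', '2019-08-01 10:02',
                                '2019-08-01 10:03', '2019-08-01 10:00', '2019-08-01 10:01',
                                '2019-08-01 10:00', '2019-08-01 10:01', '2019-08-01 10:02']),
})
counts = ordered_funnel(toy)
print(counts)
```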
Only some users follow the suggested sequence of steps. Along this path, 5.9% / 5.9% / 6.7% of users in groups 246/247/248, respectively, get from the first event to payment (slightly more in the experimental group with the new font), compared with 46.7% to 49% when the order of events is ignored.
This suggests that the app offers several paths to payment, for example, paying for an item directly without going through the cart.
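One way to check the "paying without the cart" idea is to count buyers who never opened the cart at all. A minimal sketch with a hypothetical helper name and made-up data; it only checks whether the step was skipped entirely, not the order of events.

```python
import pandas as pd


def share_of_payers_skipping(log: pd.DataFrame, step: str = 'CartScreenAppear',
                             target: str = 'PaymentScreenSuccessful') -> float:
    """Share of users with a `target` event who never triggered `step`."""
    payers = set(log.loc[log['event_name'] == target, 'user_id'])
    visited = set(log.loc[log['event_name'] == step, 'user_id'])
    if not payers:
        return 0.0
    return len(payers - visited) / len(payers)


# Made-up log: user 1 pays via the cart, user 2 pays without it, user 3 never pays.
toy = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2, 3],
    'event_name': ['MainScreenAppear', 'CartScreenAppear', 'PaymentScreenSuccessful',
                   'MainScreenAppear', 'PaymentScreenSuccessful', 'MainScreenAppear'],
})
skip_share = share_of_payers_skipping(toy)
print(skip_share)
```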
print('For the A/A/B test, users were divided into 3 groups: '
      '2 control groups (246 and 247) with the old fonts and one experimental group (248) with the new ones.')
print('')
for group in groups:
    print('There are {} users in group {}'.format(n_users[group], group))
print('')
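The two control groups form an A/A test that checks whether the splitting mechanism and all calculations work correctly. Let's see whether statistical tests find a difference between samples 246 and 247. For each event, the null hypothesis is that the share of users who triggered it is the same in both groups; the alternative is that the shares differ.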
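The test applied in `z_value_diff` is a two-proportion z-test with a pooled share. Writing x_1, x_2 for the numbers of users in each group who triggered an event (`simple_funnel`), n_1, n_2 for the group sizes (`n_users`), and Φ for the standard normal CDF, the statistic and the two-sided p-value computed in the code are:

```latex
$$
\hat p_1 = \frac{x_1}{n_1},\qquad
\hat p_2 = \frac{x_2}{n_2},\qquad
\hat p = \frac{x_1 + x_2}{n_1 + n_2}
$$
$$
z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p\,(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}},
\qquad
p\text{-value} = 2\bigl(1 - \Phi(|z|)\bigr)
$$
```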
simple_funnel = pd.DataFrame(simple_funnel)
simple_funnel['246+247'] = simple_funnel[246] + simple_funnel[247]
simple_funnel['event_name'] = order
def z_value_diff(first_group, second_group, alpha, color):
    """Run a two-proportion z-test for every funnel event and plot both groups' funnels."""
    for i in simple_funnel.index:
        # share of success in the first group:
        p1 = simple_funnel[first_group][i] / n_users[first_group]
        # share of success in the second group:
        p2 = simple_funnel[second_group][i] / n_users[second_group]
        # success rate in the combined dataset:
        p_combined = ((simple_funnel[first_group][i] + simple_funnel[second_group][i]) /
                      (n_users[first_group] + n_users[second_group]))
        # difference between the shares in the two datasets:
        difference = p1 - p2
        # z statistic: the difference in standard deviations of the standard normal distribution
        z_value = difference / mth.sqrt(p_combined * (1 - p_combined) *
                                        (1 / n_users[first_group] + 1 / n_users[second_group]))
        # standard normal distribution (mean 0, standard deviation 1)
        distr = st.norm(0, 1)
        # two-sided p-value
        p_value = (1 - distr.cdf(abs(z_value))) * 2
        print('{} p-value: {}'.format(simple_funnel['event_name'][i], p_value))
        if p_value < alpha:
            print("Rejecting the null hypothesis: there is a significant difference between the shares")
        else:
            print("Failed to reject the null hypothesis: "
                  "there is no reason to consider the shares different")
        print('')
    fig = go.Figure()
    for i, group in enumerate([first_group, second_group]):
        fig.add_trace(go.Funnel(
            name=str(group),
            y=order,
            x=simple_funnel[group],
            textposition="inside",
            textinfo="value+percent initial",
            marker={"color": colors[i + color]},
            connector={"fillcolor": '#bde0eb'},
            insidetextfont={'color': 'white', 'size': 14}))
    fig.show()
z_value_diff(246,247,0.05,0)
For none of the events was the difference significant, so both groups can be considered valid control groups.
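The statistical part of `z_value_diff` can be isolated into a pure function, which makes it easy to test on its own. A minimal sketch of the same pooled two-proportion z-test; the function name `two_proportion_z_test` and the counts in the example are hypothetical. The same test is also available as `proportions_ztest` in `statsmodels.stats.proportion`, which can serve as a cross-check.

```python
import math

from scipy import stats


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple:
    """Two-sided z-test for equality of two proportions with a pooled variance estimate."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))
    return z, p_value


# Made-up counts: 620 of 1000 vs 580 of 1000 users reached a step.
z, p = two_proportion_z_test(620, 1000, 580, 1000)
print(round(z, 3), round(p, 4))
```

Applied to one row of the table, e.g. `two_proportion_z_test(simple_funnel[246][0], n_users[246], simple_funnel[247][0], n_users[247])`, it should reproduce the p-value that `z_value_diff(246, 247, 0.05, 0)` prints for the first event.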
Let's do the same for the group with the changed font, comparing it with each control group separately for each event.
Groups 246 and 248:
z_value_diff(246,248,0.05,1)
No significant differences were found between control group 246 and the experimental group.
Groups 247 and 248:
z_value_diff(247,248,0.05,2)
No significant differences were found between control group 247 and the experimental group.
Now let's combine control groups 246 and 247 into one group and compare it with experimental group 248:
z_value_diff('246+247',248,0.05,3)
Comparison of the results with the pooled control group also showed no significant difference.
Analyzing the histogram by date and time, we decided to discard incomplete data and leave only the period from 21:00 on July 31, 2019.
When we looked at the event funnel, we found that fewer than half of users get from the first event to payment (49% / 46.7% / 47.4% for groups 246/247/248, respectively). Only 98.5% of all users opened the main page of the app at least once (perhaps the others could not reach it because of errors or incorrect operation of the app). In total, 4597 users opened the product catalog page at least once, so 39% of users never even saw the catalog. The reason should be investigated; perhaps the app does not work correctly on some devices.
We ran 16 statistical hypothesis tests with a significance level of 0.05 (12 of them tested the difference between the control groups and the changed font group), and none of them found a significant difference.
At a significance level of 0.1, only one of the tests would show a significant difference, between control group 246 and the experimental group in the share of users reaching the cart (CartScreenAppear), and this difference would not favor the experimental group. However, at a significance level of 0.1, one in ten comparisons of groups that do not actually differ is expected to come out significant by chance, so it is better to stick with the originally chosen significance level of 0.05.
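The chance of a false positive also grows with the number of tests. A minimal sketch of the arithmetic, assuming the 16 tests were independent. They are not quite independent, since the pooled group 246+247 reuses the other two groups, so the figures are only a rough illustration. The Bonferroni threshold at the end is a standard adjustment that the analysis above did not apply; since none of the 16 p-values fell below 0.05, the stricter threshold would not change the conclusion. The helper name `familywise_error_rate` is hypothetical.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive among n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


for alpha in (0.05, 0.1):
    print(alpha, round(familywise_error_rate(alpha, 16), 3))

# Bonferroni: per-test level that keeps the family-wise rate at or below 0.05
bonferroni_alpha = 0.05 / 16
print(bonferroni_alpha)
```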
Based on the results of this A/A/B experiment, we can conclude that changing the font did not have a significant impact on user behavior. This can be considered a success, since the goal was to find out whether the change would deter users. At the same time, since the new font did not improve any of the metrics either, there is no need to change the font unless the change is required to fix problems in how the app works.